Detecting outliers in high-dimensional neuroimaging datasets with robust covariance estimators
نویسندگان
چکیده
Medical imaging datasets often contain deviant observations, the so-called outliers, due to acquisition or preprocessing artifacts or resulting from large intrinsic inter-subject variability. These can undermine the statistical procedures used in group studies as the latter assume that the cohorts are composed of homogeneous samples with anatomical or functional features clustered around a central mode. The effects of outlying subjects can be mitigated by detecting and removing them with explicit statistical control. With the emergence of large medical imaging databases, exhaustive data screening is no longer possible, and automated outlier detection methods are currently gaining interest. The datasets used in medical imaging are often high-dimensional and strongly correlated. The outlier detection procedure should therefore rely on high-dimensional statistical multivariate models. However, state-of-the-art procedures, based on the Minimum Covariance Determinant (MCD) estimator, are not well-suited for such high-dimensional settings. In this work, we introduce regularization in the MCD framework and investigate different regularization schemes. We carry out extensive simulations to provide backing for practical choices in absence of ground truth knowledge. We demonstrate on functional neuroimaging datasets that outlier detection can be performed with small sample sizes and improves group studies.
منابع مشابه
Mammalian Eye Gene Expression Using Support Vector Regression to Evaluate a Strategy for Detecting Human Eye Disease
Background and purpose: Machine learning is a class of modern and strong tools that can solve many important problems that nowadays humans may be faced with. Support vector regression (SVR) is a way to build a regression model which is an incredible member of the machine learning family. SVR has been proven to be an effective tool in real-value function estimation. As a supervised-learning appr...
متن کاملDetecting Outlying Subjects in High-Dimensional Neuroimaging Datasets with Regularized Minimum Covariance Determinant
Medical imaging datasets used in clinical studies or basic research often comprise highly variable multi-subject data. Statistically-controlled inclusion of a subject in a group study, i.e. deciding whether its images should be considered as samples from a given population or whether they should be rejected as outlier data, is a challenging issue. While the informal approaches often used do not...
متن کاملA Two-Phase Robust Estimation of Process Dispersion Using M-estimator
Parameter estimation is the first step in constructing any control chart. Most estimators of mean and dispersion are sensitive to the presence of outliers. The data may be contaminated by outliers either locally or globally. The exciting robust estimators deal only with global contamination. In this paper a robust estimator for dispersion is proposed to reduce the effect of local contamination ...
متن کاملRobust Estimation in Linear Regression with Molticollinearity and Sparse Models
One of the factors affecting the statistical analysis of the data is the presence of outliers. The methods which are not affected by the outliers are called robust methods. Robust regression methods are robust estimation methods of regression model parameters in the presence of outliers. Besides outliers, the linear dependency of regressor variables, which is called multicollinearity...
متن کاملA Robust Method for Detecting DB-Outliers from High Dimensional Datasets
Outlier detection is a popular technique that can be utilized in many modern applications like financial analysis and fraud detection. As data description becomes complex, operated datasets’ dimensionalities keep monotone increasing. However, current researches find that it is extremely difficult to pick out outliers directly from high dimensional datasets owing to the curse of dimensionality. ...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
- Medical image analysis
دوره 16 7 شماره
صفحات -
تاریخ انتشار 2012